Temporal stratification of amyotrophic lateral sclerosis patients using disease progression patterns

Identifying groups of patients with similar disease progression patterns is key to understanding disease heterogeneity, guiding clinical decisions and improving patient care. In this paper, we propose a data-driven temporal stratification approach, ClusTric, combining triclustering and hierarchical clustering. The proposed approach enables the discovery of complex disease progression patterns not found by univariate temporal analyses. As a case study, we use Amyotrophic Lateral Sclerosis (ALS), a neurodegenerative disease with a non-linear and heterogeneous disease progression. In this context, we applied ClusTric to stratify a hospital-based population (Lisbon ALS Clinic dataset) and validated it in a clinical trial population. The results unravelled four clinically relevant disease progression groups: slow progressors, moderate bulbar and spinal progressors, and fast progressors. We compared ClusTric with a state-of-the-art method, showing its effectiveness in capturing the heterogeneity of ALS disease progression in a lower number of clinically relevant progression groups.


Stratification using static features and three appointments
Static features are incorporated into the similarity matrix by computing 2D patterns through biclustering, as explained in the Methods section. Figure S1 shows the progression groups and respective disease progression trajectories found by ClusTric when considering static features and the first three consecutive appointments of patient follow-up. Figure S2 presents the same information when only static features are used.
These results support the following findings:
• When static patterns are included together with temporal ones, the best partition according to the criteria used has three clusters (in contrast to the four obtained without static patterns, shown in Fig. 1). As a result, the Moderate Progressors (mainly spinal) group is not found.
• When only static patterns are considered, the best partition has seven groups, but without coherent disease progressions, highlighting that static patterns reflect the high patient heterogeneity but are insufficient to capture disease progression.
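The "best partition" above is selected with internal validity indices (Silhouette, Calinski-Harabasz, Davies-Bouldin). As an illustration only, the following minimal sketch computes the Silhouette index from scratch and compares two candidate partitions of the same data. The toy points, the labelings, the Euclidean distance, and the function names are assumptions made for this example; this is not the ClusTric implementation, which operates on a pattern-based similarity matrix.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (higher is better)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # common convention: singleton clusters contribute 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels_two = [0, 0, 0, 1, 1, 1]    # candidate partition with 2 clusters
labels_three = [0, 0, 1, 2, 2, 2]  # candidate partition with 3 clusters

score_two = silhouette_score(points, labels_two)
score_three = silhouette_score(points, labels_three)
best = "two clusters" if score_two > score_three else "three clusters"
```

In this toy case the two-cluster partition scores higher, because the three-cluster labeling splits a compact group. In practice several indices are inspected together, since they can disagree on the preferred number of clusters.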

Stratification using first appointment
We performed experiments considering the features usually assessed at patients' follow-up (Table 2) with and without the static features (Table 1). Figure S3 presents the progression groups and respective disease progression trajectories found by ClusTric without static features, while Figure S4 depicts the same information but with static features. From these results, we highlight the following findings:
• ClusTric found three groups when considering the ALSFRS scores at first appointment, with disjoint and temporally coherent trajectories.
• The inclusion of static features again revealed higher patient heterogeneity than that captured by the ALSFRS functional scale alone.

Table S3: Pairwise comparisons of survival curves of Figure 4. Log-rank test was used to test differences between pairwise survival curves for MoGP and ClusTric.

Fig. S2: Experiment with ClusTric using ONLY static features - Cluster analysis and characterization on Lisbon ALS cohort. (A) Dendrogram resulting from ClusTric; (B) corresponding evaluation scores obtained with 3-7 clusters; and (C) average temporal feature trajectories (lines) and 95% confidence intervals (shades around lines). The numbers next to each point in the trajectories represent the number of patients in the <cluster,appointment> set, whereas the numbers between consecutive appointments indicate the average slope between consecutive measurements in a cluster. Source data are provided as a Source Data file.

Fig. S1: Experiment with ClusTric using static features and first 3 consecutive appointments - Cluster analysis and characterization on Lisbon ALS cohort.
S - Silhouette (higher is better); CH - Calinski-Harabasz (higher is better); DB - Davies-Bouldin (lower is better).

Experiment with ClusTric using ONLY first appointment data -Cluster analysis and characterization on Lisbon ALS cohort.
The numbers next to each point in the trajectories represent the number of patients in the <cluster,appointment> set, whereas the numbers between consecutive appointments indicate the average slope between consecutive measurements in a cluster. Source data are provided as a Source Data file.
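The quantities annotated in these trajectory plots (patients per <cluster,appointment> set, mean with 95% confidence interval, and average slope between consecutive appointments) can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' code: the confidence interval uses a normal approximation (mean ± 1.96 standard errors), the slope is the score difference divided by the elapsed time, time is expressed in months, and the patient data are hypothetical.

```python
import math
import statistics


def mean_ci95(values):
    """Mean and 95% CI, assuming a normal approximation (mean +/- 1.96 * SEM)."""
    m = statistics.mean(values)
    if len(values) < 2:
        return m, m, m  # no spread can be estimated from a single value
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return m, m - 1.96 * sem, m + 1.96 * sem


def average_slope(patients, k):
    """Average per-patient slope between appointment k and appointment k + 1.

    Each patient is a list of (time_in_months, score) pairs ordered by time.
    Patients without appointment k + 1 are skipped.
    """
    slopes = []
    for visits in patients:
        if len(visits) > k + 1:
            (t0, s0), (t1, s1) = visits[k], visits[k + 1]
            if t1 > t0:
                slopes.append((s1 - s0) / (t1 - t0))
    return statistics.mean(slopes) if slopes else None


# Hypothetical cluster with three patients; the third has only two appointments.
cluster = [
    [(0, 40), (3, 37), (6, 34)],
    [(0, 38), (3, 36), (6, 35)],
    [(0, 42), (3, 39)],
]

# Number of patients in each <cluster, appointment> set.
n_per_appointment = [sum(1 for p in cluster if len(p) > k) for k in range(3)]

# Mean score and 95% CI at the first appointment.
first_scores = [p[0][1] for p in cluster]
mean_first, ci_low, ci_high = mean_ci95(first_scores)

# Average slopes (score points per month) between consecutive appointments.
slope_1_2 = average_slope(cluster, 0)
slope_2_3 = average_slope(cluster, 1)
```

Because the number of patients drops at later appointments, the slope between the second and third appointments is averaged over fewer patients than the first one, which is why the plots report the count next to each point.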

Table S1: Characterization of the population used in the case study based on the temporal data.
Characterization of the population in the PRO-ACT dataset and the Lisbon ALS Clinic dataset based on measurements from the 1st and 3rd visits. The features are described using Median, Interquartile Range (IQR), Average, and Standard deviation (Std).
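For reference, the four descriptors used in this table can be computed as in the short sketch below. The scores are hypothetical, and the quartile convention (the default "exclusive" method of Python's `statistics.quantiles`) is an assumption; the source does not state which convention was used, and other conventions can give slightly different IQR bounds.

```python
import statistics

# Hypothetical feature values for one group at one visit.
scores = [10, 11, 12, 12, 12, 9, 11]

median = statistics.median(scores)
q1, _, q3 = statistics.quantiles(scores, n=4)  # quartiles, default "exclusive" method
iqr = (q1, q3)
average = statistics.mean(scores)
std = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)

summary = f"Median {median}, IQR [{q1:g}-{q3:g}], Average {average:.2f}, Std {std:.2f}"
```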

Table S4: Pairwise comparisons of survival curves of Figure 5.
Log-rank test was used to test differences between pairwise survival curves of patients from the SP group.
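The pairwise log-rank comparisons reported in these tables can be illustrated with a minimal, self-contained sketch of the standard two-sample log-rank test, applied to every pair of groups. The group names and survival times are hypothetical, no multiple-testing correction is applied, and this is not the authors' analysis code; a dedicated survival analysis library would normally be used.

```python
import math
from itertools import combinations


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test.

    events: 1 = event observed, 0 = censored.
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk in A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk in B
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))

    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical groups: (survival times in months, event indicators).
groups = {
    "G1": ([1, 2, 3, 4, 5], [1, 1, 1, 1, 1]),
    "G2": ([6, 7, 8, 9, 10], [1, 1, 1, 1, 1]),
    "G3": ([1, 2, 3, 4, 5], [1, 1, 1, 1, 1]),
}

# One log-rank test per unordered pair of groups.
pairwise = {
    (a, b): logrank_test(*groups[a], *groups[b])
    for a, b in combinations(groups, 2)
}
```

Here `pairwise[("G1", "G2")]` compares two clearly separated groups, whereas `pairwise[("G1", "G3")]` compares two identical groups and therefore yields a statistic of zero.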